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Abstract 



The mass composition of high energy cosmic rays depends on their production, acceleration, and 
propagation. The study of cosmic ray composition can therefore reveal hints of the origin of these 
particles. At the South Pole, the IceCube Neutrino Observatory is capable of measuring two com- 
ponents of cosmic ray air showers in coincidence: the electromagnetic component at high altitude 
(2835 m) using the IceTop surface array, and the muonic component above ~1 TeV using the Ice- 
Cube array. This unique detector arrangement provides an opportunity for precision measurements 
of the cosmic ray energy spectrum and composition in the region of the knee and beyond. We 
present the results of a neural network analysis technique to study the cosmic ray composition 
and the energy spectrum from 1 PeV to 30 PeV using data recorded using the 40-string/40-station 
configuration of the IceCube Neutrino Observatory. 



1. Introduction 

The flux of cosmic rays at Earth is known to follow a steep power-law spectrum over a large 
energy range. The index of this spectrum is approximately constant at energies lower than about 
3 PeV, where the spectrum steepens in a feature known as the "knee". A further kink in the 
spectrum, known as the "ankle", occurs around 1 EeV where the spectrum becomes flatter again. 
The origins of these spectral changes are still uncertain. Currently, the most popular model predicts 
cosmic ray acceleration in shock fronts via the first order Fermi mechanism [T] . More specifically, at 
energies up to ^ 10^^ eV, the source of this acceleration mechanism is often attributed to supernova 
remnants; a cut-off energy which depends upon nuclear charge (Z) of the particle accelerated at 
the source could be responsible for a mass-dependent knee; the ankle is then attributed to cosmic 
rays from extragalactic sources such as gamma-ray bursts or active galactic nuclei ||2l[3l|4]. 

Various underlying source, acceleration, and propagation models, though tuned to predict similar 
energy spectra, differ considerably in energy-dependent composition in the region between the knee 
and the ankle |5j . This dependence on primary mass implies that a precise measurement of the com- 
position of cosmic rays would provide important clues as to the origins of these particles. However, 
due to the decreasing flux of cosmic rays with increasing energy, measurements above '-^lOO TeV can 
only be obtained indirectly using ground-based detectors, as opposed to satellite or balloon-based 
detectors which provide direct measurements below 100 TeV. Indirect measurements of composition 
involve a close examination of the extensive air shower produced by a primary cosmic ray particle 
colliding with Earth's atmosphere. By using information from more than one component of the 
shower, such as the electromagnetic and muonic components, the energy and composition can be 
obtained for primary particles with much higher energies than those currently measurable through 
direct detection techniques. 

At the South Pole, the IceCube Neutrino Observatory is sensitive to air showers with energies 
from below the knee to the ankle region of the energy spectrum. This region covers the predicted 
transition from galactic to extragalactic cosmic ray sources. The IceTop surface array measures a 
combination of the electromagnetic and the low energy muonic components of the cosmic ray shower, 
while the IceCube array measures the bundle of high-energy muons (>300 GeV) deep under the 
surface of the ice. In the following analysis, data from these two components are combined for a 
coincident composition measurement. 
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2. The IceCube Neutrino Observatory 

The IceCube Neutrino Observatory consists of two parts: IceTop, a surface air shower array 
[B], and IceCube, a niuon and neutrino telescope installed deep in the ice. These detectors are 
the successors to the SPASE [7 and AMANDA fS] experiments. Each array is comprised of Hght 
sensors called Digital Optical Modules (DOMs) [9J, which detect Cherenkov photons emitted by 
relativistic charged particles passing through ice. Each DOM is a spherical, pressure-resistant glass 
shell containing a 25 cm diameter Hamamatsu photomultiplier tube (PMT), a mu-metal grid for 
magnetic shielding of the PMT, and electronics for operation and control of the PMT as well as 
amplification, digitization, filtering, and calibration. 




Figure 1: Left: A schematic of the final IceCube Neutrino Observatory layout, completed in 2011. In 2008, only the 
IceTop and IceCube arrays existed, though a low-energy in-fiU array called DeepCore has since been added. Right: 
A coincident event from the IceTop/IceCube 40-string configuration of 2008. The colors represent the timing of the 
hits (red is earliest, blue is latest), and the size of the sphere around a DOM represents the amplitude of light seen 
by that DOM. In this large event, a big dust layer can clearly be distinguished as a "waist" in the amplitudes just 
over half-way down the IceCube array. 



In IceCube, DOMs are frozen into the ice along strings which are placed in a 125 m triangular 
grid formation. The DOM's are vertically spaced 17 m apart, at depths from 1450 m to 2450 m 
below the surface, as shown in Fig.[l]ja). The direction of muons (either from cosmic ray air showers 
above the surface, or neutrino interactions within the ice or bedrock) can be reconstructed from 
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the pattern (position and timing) of hit DOMs. This feature is demonstrated in Fig.[l];b), which 
shows a coincident IceTop/IceCube event from the 40-string/40-station configuration of 2008. The 
minimum energy at the surface required for a muon to penetrate through the ice to the top of the 
IceCube array is around 300 GeV, while the threshold for the muon to pass through the whole 
detector volume is around 500 GeV. 

Each IceCube string is associated with an Ice Top "station", which consists of two cylindrical ice 
Cherenkov tanks separated by 10 m. Each tank has an inner diameter of 1.82 m, is 1.3 m high, 
and contains two down-facing DOMs, with center-to-center spacing of 58 cm, frozen in optically 
clear ice 90 cm in depth. The PMT in each DOM has an adjustable gain; to increase the dynamic 
range of the detector, one DOM in each tank is operated at low gain (LG) while its partner is 
operated at high gain (HG). The tanks are lined with a diffusely reflective coating of either Tyvek 
or, in most cases, zirconium fused polyethylene, and the surface of the ice is covered with perlite. 
IceTop measures mainly the electromagnetic component of incoming cosmic ray air showers. The 
signal response of a particle passing through an IceTop tank is measured in photoelectrons (PEs); 
however, each tank must be calibrated to obtain a uniform measurement. Therefore, the amount 
of charge deposited by mainly vertical 1-10 GeV surface muons passing through a given tank 
is defined in terms of a common unit, called a "vertical equivalent muon", or VEM. A calibration 
process is then performed by comparing the charge spectra of each DOM - which shows a clear 
muon peak - in data and simulation. The altitude of the surface array is 2835 m, which corresponds 
to an atmospheric depth of ^680 g/cm^. For showers with tens of PeV energies, this is just below 
the proton shower maximum; therefore, an excellent energy resolution is expected. 

Construction of the IceCube Neutrino Observatory began at the geographic South Pole in the 
2004-05 austral summer (TH] and was completed during the 2010-11 austral summer. This work 
used the configuration of the detectors operational from April 2008 to May 2009, which consisted of 
40 IceCube strings (2400 DOMs) and 40 IceTop stations (160 DOMs). Figure [l]; a) shows the final 
detector configuration of 86 strings, while Fig. [ijb) is an example data event from the 40-string 
configuration studied in this work. 



3. Data and Simulation 

3.1. Data 

This analysis uses data from August 2008, when the detector was in its 40-string/40-station 
configuration, for an overall detector livetime of 29.78 days. A study of the relationship between 
the electromagnetic and muonic air shower observables (if 70 and 6*125, a-s discussed in Section |4| 
throughout the full year of data from the 2008 configuration revealed significant seasonal variations 
in the relationship between the observables used for this analysis [llj. Limiting this data set to 



August 2008, a month which corresponds to our simulated data (described in Section 3.2 1, effectively 
mitigates these fluctuations. 

Local coincidence (LC) and trigger requirements are the first steps in the data acquisition system 
of the detectors [9J . The LC requirement is satisfied in IceCube when two neighboring or next-to- 
nearest-neighboring DOMs on a string pass a signal threshold of 0.25 photoelectrons (PE) within 
1 fjs. The LC requirement is satisfied in IceTop when either both HG DOMs, or a HG DOM and 
the LG DOM from the neighboring tank in the station have passed the signal threshold of about 
20 PE within 1 /is. An event has triggered a detector when a certain number of DOMs record LC 
signals within a 5 /is sliding time window: eight DOMs for IceCube, and six DOMs for IceTop. 
Additionally, a "coincident event" is one where both detectors have triggered. 
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3.2. Simulation 

Monte Carlo simulated events were produced for this analysis using the CORSIKA air shower 
generator [12J with the SIBYLL-2.1/FLUKA-2008 hadronic interaction models |I31[T1]. Five par- 
ticle species (proton, helium, oxygen, silicon, and iron) were generated with an spectrum from 
10 TeV to 50 PeV. The showers were generated uniformly over all azimuths and with a sin(6')cos(6') 
distribution in the zenith range from 0° to 65° , which is steeper than that reached by reconstructible 
coincident events. For each species, 3000 showers were simulated per third of a decade in energy. 
Each shower was oversampled 100 times over a circle with radius of 1200 m leading to a total of 16.5 
million generated events. The atmospheric model chosen corresponds to the austral winter months 
at the South Pole. For initial comparison with experimental data, the simulation is reweighted, 
independent of primary mass, to an E^^ '^ spectrum at energies below a knee at 3 PeV and an E^^'^ 
spectrum above the knee. The IceTop detector is simulated using parameterizations of the tank 
response obtained from a complete Geant4 |15l [TB] simulation. The muons are then propagated 
through the ice to the depth of the IceCube array using the muon propagator MuonMonteCarlo 
[TT which accounts for continuous and stochastic energy losses. The Cherenkov photons from the 
muon bundles passing through the volume of the array are simulated using the software package 
Photonics [TB] which models the full structure of the ice properties according to the standard ice 
model used for the 40-string configuration of IceCube (known as the AHA ice model) [T^ . This is 
followed by a simulation of each aspect of the DOM electronics and the trigger. The DOM signals 
are then processed in the same way as experimental data. 

In IceCube, the parameterization for the light yield from muon bundles in the Photonics soft- 
ware has recently been improved. This improvement revealed an energy-dependent offset between 
our data and our standard simulation. Thus, a small set of simulation was generated using the new 
software and this discrepancy was parameterized using a comparison of the two simulation sets. 
This parameterization has been applied as a correction to the experimental data from this point 
forward. 

4. Reconstruction Algorithms 

Reconstruction with IceTop 

Data collected by IceTop are analyzed to reconstruct the core position, direction and size of a 
cosmic ray air shower [20 , 21j. These parameters are determined by comparing the detected signal 
locations, charges, and times from hit stations (as well as the locations of unhit stations) to what 
is expected from a cosmic ray air shower, according to a likelihood function: 

£ = + £o + A. (1) 

where Cg is the likelihood of signal charges from a lateral distribution function, is the likelihood 
of unhit stations, and Ct is the likelihood of signal times from an expected timing profile. All three 
are described in detail in Ref. [21,, but a brief overview is given here. 

The first term, Cg , represents the likelihood of the array responding with measured signals in hit 
tanks Si, given a set of expected signals for each of those hit tanks S'f*. Signals are assumed to have 
normal distributions in logio(5'i), and are expressed in units of VEM (as discussed in Section |2|. 
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The signal expectation value, 5f' , is a function of the perpendicular distance from the shower 
axis, r, and is given by a lateral distribution function which has been derived from simulations: 



r 



-/3-fclogio(-jj^) 



S{r) - ^,cf • [j^^ j , (2) 

where k is a constant optimized from simulations, and /3 is fit as a free parameter to each event. 
The shower size, S'ref, is the signal expectation at a reference distance, R^ei, from the shower axis. 
The reference distance is chosen as the point in the fit which is the most stable or robust: multiple 
studies of the fit stability led to a choice of 125 m for this quantity [H]. Figure [2] shows an example 
of the distribution of charges for one shower from the 2008 IceTop configuration with 40 stations. 
The superimposed curve is the lateral distribution function of Eq. |2] with the best-fit iS'125 for this 
particular event. 
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Figure 2: An IceTop event from September 11, 2008. Left: The core location is denoted with a star (near station 64) 
and the direction of the shower {9 15°, rf) ^ 32°) is indicated by the arrow, and the dotted hne represents the plane 
of the incoming shower front. The timing of the hits is represented by the colors, from red (earlier) to blue (later). 
The amount of signal observed by each tank in a given station is denoted by the size of the hemisphere closest to it. 
Right: The lateral distribution of these hits, color-coded for timing in the same scheme as for the event display. The 
expected lateral distribution of signals is shown in black; 5i25 is this function evaluated at 125 m from the core of 
the shower; in this case, 12.2 VEM. 



The second likelihood term in Eq. ([ij accounts for all stations j with the tanks (j. A) and (j, B) 
that did not trigger: 

/:o=Ei°gio(i-^:A)^:s)), (3) 

where P^^X^^g-j is the probability that the "A" tank (or the "B" tank) within station j did not trigger, 

for a given signal expectation value S^^ . 

The third term of Eq. ^ describes the probability of measuring the observed signal arrival 
times, ti, given expected signal arrival times, tf*. Each arrival time is assumed to be normally- 
distributed with standard deviations ati{ri), depending on the radial distance, r^, of the tank to 
the shower core. 

Studies of experimental data reveal that the shower front is not a flat plane. Rather, the 
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expected arrival times of the signals can be described as those expected from a flat plane, modified 
by a "curvature term" which is symmetric around the shower axis and a function of radial distance. 
The expected signal arrival time at a tank with position x is thus parametrized as: 



Here, is the arrival time of the shower core at the surface of the ice (which is the surveyed level 
of the IceTop tanks), is the position of the shower core on the ground and n is the unit vector 
in the arrival direction of the shower. The last two terms in Eq. |4] (the parabola and the Gaussian) 
together are the "curvature term". The functional form of the curvature term and the values of the 
parameters a, 5, and a were obtained from a study of the time residuals in experimental data. 

Thus, in this likelihood function there are seven total possible free parameters: S'125, /3, and the 
track's core position (xcUc) ^ind direction and core time (9, cj), Iq). 

IceTop Fit Procedure. 

Events in IceTop are initially reconstructed with two fast "first guess" algorithms. First, the 
core position is estimated using the center of gravity (COG) of the tanks which were hit, weighted 
with the square root of the charges: 



Then, the direction of the shower is estimated by fitting a plane (without curvature terms) to the 
arrival times of the signals. Both first guess core position and track direction together provide 
the seed track for the more advanced lateral distribution fit involving all three likelihood terms 
described in the previous section. 

Multiple lateral distribution fits are performed in succession for best results. At each step a full 
minimization is performed. First, the shower direction is fixed and only the core position, shower 
size, and /3 are allowed to vary as free parameters. Next, since Eq. ^ can exhibit unphysical 
behavior at small values of r, a second fit is made (again with shower direction fixed) in which 
all tanks closer than 11 m to the reconstructed shower core are excluded from the fit. If an 11 m 
circle around the new core location includes tanks which were not excluded before (which happens 
rarely), they are added to the list of excluded tanks and this second step is repeated until no new 
tanks are added to the list. In a third step, the track direction is allowed to vary, as well as the core 
location and shower variables with smaller stepsizes. Finally, the track direction is fixed again and 
a final fine-tuning of just the core location and shower variables (with small stepsizes) is performed. 

4.. 2. Reconstruction with IceCuhe 

Bundles of muons that reach the IceCube detector can have muon multiplicities ranging from 1 
to more than 1000 at the highest energies, and an average lateral size of 20-30 meters. When these 
muons pass through the ice at a speed greater than the speed of light in the medium, they produce 
Cherenkov photons which can be detected by the DOMs of the IceCube array. Many techniques for 
reconstructing the direction and energy of single particle tracks exist, which vary track parameters 
until the likelihood of producing the measured amplitudes and arrival times of photons at the DOMs 
is maximized [22] . But when the source of the photons is a muon bundle rather than a single muon 
track, reconstruction can be improved with a likelihood function specific to muon bundles, which 





(5) 
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Figure 3: Left: A cartoon of a muon bundle traveling through the ice. As it reaches deeper slant depths, the muons 
in the bundle range out (where the arrows end). The intensity of Cherenkov photons emitted from the bundle, as 
a function of perpendicular distance from the track axis, is sketched as blue curves for several slant depths. This 
intensity is fit to a function of the form of Eq.|6]with the muon bundle reconstruction _23j. Right Lateral distribution 
of detected photons from an experimental event from August 2008, together with the results of the muon bundle 
reconstruction's fit. 



was developed using the prototype detectors SPASE-2 and AMANDA-BIO [H] and has been 
updated for use with IceCube. 

The expected signal for a given DOM can be described by an in-ice lateral distribution function 
(LDF), which depends upon its perpendicular distance, d, from the center of the muon bundle, as 
well as the slant depth, X, of the muon bundle at the point where it passes closest to the DOM, 
and the effective attenuation length, X^jj, of the ice surrounding the DOM. The overall expected 
signal, ^, is 

^^NN,,,,,,,{X) J_ e-''/'^ff(^^\ (6) 

where N^^^f.pth{X) is a "range-out" function which describes the number of muons as a function of 
slant depth: 

N^,depth (X) = A [(^) {e"^ - 1)] . (7) 

based on a model for continuous and stochastic energy loss of muons (parametrized by constants a 
and b respectively), combined with the muon energy spectrum index 7^. As the bundle penetrates 
deeper, correspondingly less light should be expected at the deeper DOMs as muons "range out" 
(as shown conceptually in Fig. |3(a)"| : 

The effective attenuation length Xeff{z) is a combination of both scattering and absorption 
effects in the ice. Both of these properties vary with DOM location, due to horizontal dust layers. 
Therefore, X^ff is a function of each DOM's depth in the ice, z, and is parametrized as an average. 
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Figure 4: An example of a test to see the effect of possible misreconstructions. The true track position of a single 
simulated 12.3 PeV shower was anchored at the surface while the direction was varied repeatedly within 1.1° of the 
true direction. Using a number of similar tests, the reference distance is chosen in order to have the smallest variation 
in the DOM average response: for this detector 70 m proved to be the most stable reference point; thus K^q is used 
for the analysis. 



or "bulk." Aq times a z-dependent correction factor: 



,(z) • Ao. 



(8) 



The correction factor Cicc is based on scattering length data measured at many different depths. 

For DOMs which are far from the muon bundle {d > 80 m), light is likely to have traversed mul- 
tiple dust layers before reaching the DOM. Thus, in Eq. ([6| the average effective attenuation length 
of the bulk ice, Aq, (with no dust correction) is used for these DOMs, and the normalization, TV, is 
modified by a factor chosen to ensure continuity of the function over all distances. 

Therefore, the expected signal ^ for each DOM can be computed using Eq. ([6]), which already 
includes the effect of dust layers and muon range-out according to that DOM's position. This 
expected signal is then compared with the hits recorded by the DOMs using a likelihood method, 
as in Fig. |3(b)] The overall normalization of the function, N, the average attenuation length, Aq, and 
the direction of the central track, are fitted as free parameters. Although Aq is known to be around 
25 m |TB], this length is fit as a free parameter because minor errors in the track reconstruction 
can change this fitted slope of the LDF of photons in IceCube, as shown in Fig. |4] for which the 
direction of a single simulated 12.3 PeV track was varied from the true direction by up to 1.1° to 
show the effect of possible misreconstructions. 

After the fitted track is found, a parameter is needed to quantify the overall amount of light 
(and number of muons) in the ice. Like the 5125 parameter (which is the IceTop lateral distribution 
function fit at a reference distance of 125 m), the if 70 parameter is the IceCube lateral distribution 
function of Eq. |6] evaluated at a reference slant depth of 1950 m, with a reference dust correction 
(cico = 1), and a reference perpendicular distance D^cf of 70 m: 



KiD, 



cf 



A 

V-Dref Ao 



--Drct/Ao 



(9) 
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where A (the overall normalization) and Aq (the average attenuation length) are free parameters 
which were fit. The reference distance of 70 m was chosen to provide the most stable measurement 
under errors in the reconstructed track direction. A number of tests were performed to choose this 
reference point: one example is shown in Fig. [4] where the small spread around 70 m indicates the 
stability of this choice in reference distance. 

4-.3. Full Reconstruction: Multiple Iterations 

The IceCube track reconstruction algorithm described in the previous section can clearly be 
used to estimate the multiplicity of the muon bundle, since this should scale with the overall 
normalization A of the LDF of Eq. [6j But since the track direction affects the expected ^'s at all 
the DOMs (by changing their perpendicular distances d and slant depths X), the same likelihood 
function can also be used to find the track direction. The core position and direction from the 
surface reconstruction are used as the "seed track" for the fit; the core position is anchored at the 
surface, while the direction of the track is allowed to vary in search of the best likelihood. Because 
the DOMs participating in the fit are far from where the track is fixed at the surface, a long "lever- 
arm" gives improved angular resolution. Once this IceCube reconstruction has been performed, 
the resulting fitted track can be used as an improved seed for a second iteration of the surface 
reconstruction, leading to a more accurate core location. The improved surface reconstruction is 
then passed back as a seed to the IceCube reconstruction one last time. Thus, two iterations of 
both the lateral fit reconstruction and the muon bundle reconstruction are performed (IceTop 
IceCube — IceTop — )■ IceCube) . Figure [s] shows the improvement to both the core location and 
angular reconstruction achieved through this iterative procedure. The two observables which will 
be used for composition (K-^Oi which is related to the number of muons in IceCube, and 6*125, which 
is related to the number of electrons and photons in IceTop) are also drawn from the last iteration 
of fits, where the improved resolution allows for accuracy in measuring those parameters. 

4-.4- Snow on IceTop 

During the course of this analysis, it was discovered that the presence of research buildings in 
the vicinity of the IceTop array have caused the snow to drift unevenly over the IceTop tanks. 
Some tanks are covered by more than 1 m of drifted snow, while others have a mere sprinkling: 
in particular, the part of the detector constructed first and nearest to structures (the "older half") 
tends to have a great deal more snow than the part constructed more recently (the "newer half"). 
Therefore, the signals in the tanks of the older half of the array are attenuated, and this effect is 
not present in the standard simulation used for this analysis. However, a small sample of simulated 
proton and iron showers with snow attenuation was used to study this effect and to develop a method 
of accounting for it in event reconstruction. For each tank, the signal expectation is lowered by a 
factor which depends upon the measured snow depth above each tank, as well as the snow density, 
which was measured to be about 0.4 g/cm^. Thus, a correction has been made to the data at this 
reconstruction stage so that it corresponds to the standard simulation described above. The error 
introduced by this correction will be addressed in Section |9] 

5. Event Selection 

Experimentally observed and simulated events have been selected in multiple steps in order to 
ensure a reliable reconstruction of direction and core location, to retain as many events as possible, 
and to avoid introduction of any bias by primary mass and energy: 
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Figure 5: The angular resolution (left) and distance between true and reconstructed core position (right) for simulated 
data. The "ideal limit" (black) is the maximum resolution that could be achieved by the IceCube track reconstruc- 
tion (left) and by the IceTop core reconstruction (right), obtained by using the fixed true core location and true 
direction respectively. After two iterations of both IceTop and IceCube reconstructions, the optimal core resolution 
is already found, so the direction reconstruction is also limited to two iterations. The second iteration through the 
two reconstructions clearly provides an improvement to both the angular reconstruction of the track and the shower 
core position at the surface |25| . 




Figure 6: The angular resolution (left) and core position resolution (right) between the true track and the final 
reconstructed track in proton (red dashed) and iron (blue solid) simulation after the event selection described in 
Section [5] An angular resolution of 0.5° and a core resolution of 9 m are achieved (where resolution is defined such 
that 68%, or one gaussian sigma, of all events are contained within that range). 
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Step 1: Initial filter and reconstruction requirements. For this analysis, a simple filter was 
applied requiring 5 IceTop stations and 5 IceCube DOMs for an event to pass on to the next 
stage. (Note that although 8 IceCube DOMs were required to trigger IceCube, some of those 
hits will have been removed in a cleaning phase where hits are clustered into events. This 
process can leave less than 5 DOMs in the final event.) Furthermore, for some events either 



the IceTop or IceCube reconstruction (as discussed in Section 4.3 1 fails to find a successful 
minimum. At this stage, an event is excluded if it does not have successful reconstructions in 
both IceTop and IceCube. 

Step 2: The shower and the track are reconstructed as passing within the area and volume 
of the two arrays. For an event to be considered "contained" within IceTop, the shower 
core must be reconstructed within the two-dimensional outer boundary of the array. To 
satisfy IceCube containment, the track must be reconstructed within the three-dimensional 
outer boundary of IceCube. This selection is applied using three different reconstruction 
algorithms: the final muon bundle reconstruction (which is used for the analysis), the first 
surface reconstruction (which has not been biased by any IceCube track reconstructions) , and 
an independent IceCube track reconstruction (which has not been biased by taking as input 
a surface reconstruction as a first guess) must all satisfy IceTop and IceCube containment. 
As shown in Table [T] this basic selection eliminates the largest fraction of events, but is the 
most important one for improving core reconstruction and angular resolution. 
Step 3: The final IceTop reconstruction and the final IceCube reconstruction show good di- 
rectional agreement. A disagreement as to the shower direction between the final iteration 
of the surface and IceCube track reconstructions indicates a shower for which either S'125 or 
K^o was reconstructed poorly and it is impossible to discern which, if either, is correct. The 
opening angle, A^*, between the surface reconstruction and the IceCube track reconstruction 
is conservatively required to be less than 1° . 

Step 4: The length of the track within the IceCube array must be reasonably long. High energy 
showers with small track lengths are likely to be events with a poorly reconstructed direction 
and/or core position. Thus, a requirement that a track length - estimated by calculating the 
length of the track segment between the first and last direct hit if within a -15 ns to +75 ns 
residual time window - is greater than 850 m for showers with S'i25>5.625, linearly decreasing 
to 400 m at 5125 = 0, is applied. 

Step 5: The fitted slope of the LDF function is within a reasonable range. The reconstructed 
effective propagation length, Aq, should resemble the independent measurements for Antarctic 
ice, though a range of values can still provide a robust Kjq, as described in Section [42] and 
shown in Fig. [4j As this parameter is a free parameter in the fit, it does have a slight 
dependence on the mass of the primary particle; thus, to avoid introducing a bias into the 
event sample very loose requirements are made: 

20 m < Ao < 150 m ; Ao < ^ + 40 ; Aq < + 40. (10) 

Step 6: Events containing random coincidences are removed. In experimental data, uncor- 
related events which pass either the IceTop or the IceCube detector can mimic a coincident 
air shower. Such events are not simulated. These events can be isolated by examining a 
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Event 



Number of Remaining Events 



Selection 


Data 


Simulation 


Contained 


A^- < 1° 


AR < 25 m 


Step 1 


1141590 


62954 


16601 (26.4 %) 


25490 (40.5 %) 


35313 (56.1 %) 


Step 2 


368815 


11737 


11201 (95.4 %) 


10443 (89.0 %) 


11190 (95.3 %) 


Step 3 


325685 


10959 


10484 (95.7 %) 


9993 (91.2 %) 


10532 (96.1 %) 


Step 4 


259717 


8392 


8129 (96.9 %) 


7872 (93.8 %) 


8074 (96.2 %) 


Step 5 


239902 


8122 


7868 (96.9 %) 


7694 (94.7 %) 


7815 (96.2 %) 


Step 6 


239797 


8112 


7859 (96.9 %) 


7685 (94.7 %) 


7808 (96.3 %) 



Table 1: Event passing rate at each selection step (see Section [Sjl. In this table, simulation includes proton and 
iron only, of which 6.6 million total events were generated. The event selection refers to the steps in Section [5] 
"Contained" shows the number of simulated events at each step for which the true simulated track was contained 
within the area and volume of the two arrays. At/) is the opening angle between the true and the reconstructed 
direction, and AR is the difference between the true and the reconstructed core location. In each case, at each cut 
level, the number of events is given as well as the percentage they comprise of the total number of events at that 
cut level. After all six selection levels, at least 94% of the events used for this analysis are well-contained and have 
a core resolution within 25 m and an angular resolution within 1°. 



parameter for the arrival time difference of the track hypothesis at the surface of the ice: 

c • cos(6'/c) 

where zjt is the altitude of the reconstructed core position of the shower in Ice Top, zjc is the 
point on the IceCube track closest to the center of gravity of the hits, t is the reconstructed 
arrival time of the shower at depth z and 9 is the zenith angle, as reconstructed by independent 
IceCube (IC) and IceTop (IT) reconstructions. It has been found that these uncorrelated 
events can be distinguished from the truly coincident sample by requiring At less than 3 /is. 

With this event selection, the resolution in core position is better than 9 m and in track direction 
is less than 0.5° (where resolution is defined such that 68.3% of all events are contained within that 
range) , as shown in Fig. [6] The final event sample contains 239797 events from the August 2008 
experimental data and 20289 for all five simulated primaries from the Monte Carlo simulation. 



6. Neural Network Mapping Technique 

After the event selection criteria from Section[5]have been applied, a strong relationship between 
muon and electron observables (the vertical axis -ftTyo, and the horizontal axis 5i25), and cosmic ray 
primary mass and energy (A and Eq) can be seen in Fig. [t] Black lines approximating contours 
of energy (in log]^g(i?o/GeV)) guide the eye. Mass is indicated by color; each colored bin in the 
K^Q-Sl25 space is shaded according to the percentage of proton found in that bin for simulated 
proton and iron primaries only for clarity. In the analysis intermediate species were included. The 
separation between proton and iron is clearly visible in Fig. [7] especially at the highest energies. 
However, the energy contour lines are not parallel and the proton and iron distributions are also 
not parallel, which implies that the relation between the K^q-Si25 space and the mass-energy space 
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log^^(S125/VEIVI) 



Figure 7: The Kro-Si2r, parameter space after the final event selection. The colors depict the percentage of protons 
in each bin (where for clarity only simulated proton and iron showers are shown). A bin which has more proton 
than iron will appear reddish, whereas a bin containing more iron than proton will be shaded in blue. Intermediate 
purples indicate regions of overlap. The dotted black lines depict approximate contours of constant primary energy 
and are labeled in logio(Eo). (For the analysis, all five primary types were included) 




Figure 8: This figure shows the neural network chosen, with structure 2:5:5:2. The nodes are depicted with (green) 
circles, and the black lines are the weights connecting the nodes. Weights are determined by training the network; 
thicker lines represent a stronger connection between nodes. 
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is non-linear, as the proton/iron distribution becomes distinctly more orthogonal to 6*125 higher 
energies. 

In light of the non- linear correlation between the first two-dimensional space (i^7o-5'i25) and 
the second (A-Eq), a mapping technique is required to step between them. For flexibility in testing 
combinations of multiple new input parameters under development in IceCube and IceTop, a neural 
network technique was developed. It is important to emphasize that both outputs are continuous 
in nature (as opposed to quantized); therefore, this style of neural network is referred to as either 
an interpolation or a regression problem |26| . 

A neural network consists of a set of input parameters which are connected to a set of output 
parameters through a series of nodes which are arranged in layers. Each node in a layer is connected 
to all nodes of the previous and subsequent layer via a series of weights, and is assigned an activation 
function which, for an interpolation problem, is typically a sigmoid function for the internal layers 
and a linear function for the output layer. Each node then represents the activation function acting 
on the summed, weighted input parameters. The structure used for this analysis can be found in 
Fig.[8l 

The neural network used here, shown in Fig. |8] is a feed-forward multilayer-perceptron as 
found in the ROOT TMultilayerPerceptron package ^7\. The network takes logio(S'i25) 
and Iogio(-f4r7o) as input parameters and is trained on Monte Carlo simulation data to find output 
parameters of logio(-Bo) and ln{A) [11]. Both the two inputs and the two outputs are renormalized 
so that they are numbers between and 1. 

Over the course of a number of training cycles, the network "learns" to find the best fit to the 
expected outputs. During each cycle, the error between the output parameters (obtained from the 
given network structure) and the true parameters is calculated. The weights in the network are 
updated, as prescribed by a learning algorithm, to reduce this error. The error is minimized after 
a sufficient number of learning cycles. This iterative training procedure is performed using 1/4 of 
the final Monte Carlo sample (~1000 events of each species). This "training" sample provides the 
network with ifyo and S'125 as well as the true mass and energy outputs. 

A "testing" sample, consisting of an independent set of 1/4 of the final Monte Carlo samples, 
is used to check the progress of the neural network training. After each learning cycle, the testing 
sample is also passed through the network, the output parameters are calculated and then compared 
with the true values, to verify that the network is not becoming too specific to the training sample 
(in the case of "overtraining" [53]). The testing set is also used to optimize a number of other 
network parameters, such as learning method, architecture, and activation functions. For this 
analysis a broad range of combinations were evaluated and the network which was chosen provided 
the testing sample with the best energy resolution and the best mass reconstruction for the energy 
range accessible (1 PeV- 30 PeV), within the fewest learning cycles. 

The final optimized network had the following characteristics: 

• Architecture: One method for choosing network architecture is to incorporate the fewest 
hidden layers and nodes possible to provide results of the desired accuracy Following 
this prescription, the network chosen had two hidden layers of five nodes each (for the 2:5:5:2 
setup as seen in Fig.[8|. 

• Activation Functions: The architecture is defined by nodes with sigmoid activation functions 
for the hidden layers and linear activation functions for the output layer. 

• Learning Algorithm: The quasi-Newtonian learning algorithm of Broyden, Fletcher, Goldfarb 
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and Shanno (known as BFGS) was chosen based on its efficiency in returning a desirable 
network |27| . 

• Training Cycles: Using 947 training cycles provided the smallest errors without overtraining 
for this particular structure, learning algorithm, and simulated data sample. 

To avoid biases, the training dataset was not included in the simulation data sample used for 
the analysis. Furthermore, while the testing sample was not used to train the network, it was used 
to optimize the choice of the network structure, learning algorithm and the number of learning 
cycles, and it was used to optimize the minimization process (as will be discussed in Section |8|, all 
of which could introduce a bias. Thus, the testing sample was also removed from the final sample. 
Half of the simulated data was independent of the training and testing samples (^2000 events of 
each species) and is called the "analysis" sample. This half was used for the final analysis steps 
described below. 

The energy resolution from the neural network ranges from 18 — 20% in the threshold region 
(~1.5 PeV) to 6 — 8% at 30 PeV, for an average resolution better than 14% over the full range of 
energies, with a bias which is well within the resolution, as seen in Fig. [9| The mass parameter 
output by the network is well-distributed across the full mass-space, as seen in Fig. [11] which IS 
described in Section [8j this will require a further minimization step to find the most likely mass of 
the data as a function of energy. 



7. Energy Spectrum 

At this point it is important to note that the correction described in Section [4.4| accounts for the 
signal attenuation due to the snow, but cannot account for the change in the composition-dependent 
energy threshold - which is higher in energy for the older half of the array due to the deep snow. 
Therefore, from this point forward, in both simulated and experimental data events with a shower 
core located on the older side of the array (x-position more than 200 m in Fig. 2(a)[ ) have been 



removed to minimize this effect. The number of events surviving as a function of reconstructed 
energy can be found in Table |4] 

For a determination of the energy spectrum, the data were divided into energy bins. Nine bins 
for each decade in energy are chosen so that the bin width is sufficiently larger than the resolution 
and bias in the reconstructed energy. 

The flux, ^E*, as a function of energy, Eq, is given by: 

.T,^p^ 1 logio(e) dN 



where e is Euler's constant (the base of the natural logarithm); rji^Eo) is the efficiency (Fig. 10(a)[ ) 



defined as the ratio of simulated events left after the final event selection to the number generated; 
A = 7r(1200 m)^ is the geometrical area over which the CORSIKA air showers were generated; r = 
29.78 days is the exposure time, or livetime, of the detector; and il is the solid angle weighted by 
the generated zenith distribution: 

n^2TT cos{9) sin{e)de ^TTsin^iemax), (13) 

Jo 
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Figure 9: Performance of energy reconstruction. In these figures using simulated data, Ercco is the reconstructed 
energy provided by the neural network, while Etrue is the true primary energy generated for a given air shower. Red 
(circles) mark proton and blue (squares) denote iron, (a): The true vs the reconstructed energy of the IceTop/IceCube 
40-string/40-station detector, including a yellow line representing a 1-to-l ratio to guide the eye. (b): The energy 
resolution (68 percentile deviation from the 1-to-l line), which ranges from 18—20% in the threshold region (~1.5 PeV) 
to 6 — 8% at 30 PeV. (c): The energy bias or energy misreconstruction (the shift of the peak distributions from the 
same 1-to-l line), (d): The energy resolution for the full energy range used in this analysis, for both proton and iron, 
where the mean and sigma reported are from fits of a Gaussian distribution to each curve individually. 
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where 9 is the zenith angle. In other words, the flux can be found by dividing the energy distribution 
by the detector aperture, A{Eo) = rj{EQ)A^l, and the exposure time, r. 

The efficiency can be found in Fig. |10(a) for proton and iron separately. To account for the fluc- 
tuations caused by the comparatively small statistics found in the simulated aperture distribution 
the aperture. A, is fitted to a sigmoid function modified by a line, of the form: 



A^r^AVL 



Po 



1 + g-Pl(-Bo-p2) 



{PsEq + Pa) , 



(14) 



as can be found in Fig. |10(b)[ which shows the aperture for all generated particle types. The slight 
loss of efliciency at the highest energies is an understood artifact of the minimization used in the 
event reconstruction algorithm. The resulting measured spectrum is shown in Fig. 10(c)[ where 
the minimum energy shown corresponds to the point of maximum efficiency from the fit Eq. (14 1, 



and the maximum shown is chosen to be two energy bins below the maximum energy simulated to 
avoid edge effects. 

A phenomenological evaluation was developed [5S 
was expressed as: 



], where the differential cosmic ray ffux 



1 + 


mi 







(15) 



where Z is the charge of a particle, $2 is the absolute ffux at 1 TeV, 7^ and 7c are the indices 
below and above the cut-off energy Ez, respectively, and is a parameter defining the curvature 



of the knee (where the knee becomes sharper as increases). The measured spectrum in Fig. 10(c) 
has been fit to the fiux model in Eq. (151, with the resulting power law indices of 2.62 ± 0.05 at 
energies below and 3.26 ± 0.06 above a slowly turning knee around 4.98 ± 0.37 PeV, where the 
errors given are statistical. The data points are given in Table [2] and the fit parameters are shown 
in Table [3] The systematic error bars will be discussed in Section |9] 



8. Composition 

As mentioned in Section [6] mass composition requires a further minimization step. Within 
each slice in energy and for each simulated species, the neural network produces a distribution of 
mass outputs which are referred to as "template histograms". Examples of template histograms 



for all five species are shown in Fig. 11(a) right), for a particular slice in energy selected from 
Fig. 11 (a) [ left). Experimental data, when passed through the same neural network, also has a 



histogram of mass outputs which can be decomposed into a linear combination of the template 
histograms of the individual nuclear species (proton, oxygen, etc.). A minimizer finds the optimal 
mixture of simulated species to match the experimental data. 

Note that the distributions of the templates in Fig. ll(a)[ right) very rarely approach either 



or 1 on the output axis: this is due to the overlap in the distributions, as shown in violet in Fig. [7] 
The type output from the network can be seen as a probability of being a certain species of particle: 
as the separation between species is never complete the probability that a particle is any given type 
will never be 1. Furthermore, the higher mass particles have much more overlap between types, 
thereby causing probability of reconstructing an iron particle to be slightly lower. 

To find the best mixture of species, the contribution of each species is quantified as a fraction, 
with the constraints that each fraction must be between and 1, and the fractions of all nuclei 
must sum to unity. The template histograms C for each species are weighted according to these 
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Figure 10: Upper left: The efRciency (as defined in the text) with respect to the true energy, where (red) circles 
mark proton and (blue) squares denote iron. The efficiency is much smaller than 1: this is a combined effect of the 
requirement of coincidence between the two detectors, and the large radius within which the showers are generated. 
Upper right: The aperture with respect to the energy reconstructed by the neural network for all simulated data 
(all particle types), fitted as described in Eq. ( |14[ |. Lower. The reconstructed energy spectrum for the August 2008 
data, which uses the aperture shown above. The additional symbols show alternate results from simulations using 
alternate models, as discussed in Section [9] 
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Table 2: The all-particle cosmic ray energy spectrum measured by the IceTop-IceCube 40-string/40-station arrays 
assuming hadronic interaction model SIBYLL-2.1. 



Parameter 



Best Fit 



$1 / lO^W^s-isr-i 
Eq I PeV 

7c 



1.81 ±0.65 
4.75 ±0.59 
-2.61 ±0.07 
-3.23 ±0.09 
5.59 ±3.81 
4.95/9 



Table 3: Fit parameters for the cosmic ray energy spectrum shown in Fig. |10(c)| using Eq. ( |15| 
chi-squared of the fit. Statistical errors only are shown. 



as well as the 
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Energy Range 


Experimental 


Monte Carlo Simulation 




(logio(E„co /GeV)) 


Data 


Protons 


Helium 


Oxygen 


Iron 


Q a K 


loooo 


150 


134 


150 




6.5-6.7 


8450 


165 


142 


132 


133 


6.7-6.9 


3461 


125 


174 


134 


151 


6.9-7.1 


1214 


121 


150 


130 


158 


7.1-7.3 


427 


148 


144 


147 


152 


7.3-7.5 


150 


112 


124 


151 


143 



Table 4: Number of events of experimental data and simulation, for the six energy ranges analyzed for composition, 
after application of all cuts (including the selection for the newer half of the array). 



fractions and summed to produce a histogram of their mixture Cmix- Two species "A" and "B" (i.e. 
proton and iron only, as shown in Fig. |ll(b)] ), are mixed using: 

Cr^. = {.fA)CA + il- fA)CB, (16) 

where /a is the fraction of the mixture which is species A. Expanding to three species "A", "B", 
and "C", the mixture histogram is formed by: 

Cmix = (/a)Ca + (1 - /a) + (1 - /s)Cc] . (17) 

Here, /a is the fraction which is species A, fs is the fraction of what remains (which is not A) 
which is of species B. Mixtures of four or more species can be constructed with similar logic. A 
minimizer can vary Ja, /b, etc. as free parameters. Constructing the mixture in this way makes 
it easy to enforce in the minimizer that all fractions are physical (that is, between and 1, and 
properly normalized). 

The mixture of species that best matches the data is found using a test and the software 
package MINUIT |3D], where: 

all bins ^ 

X^^ ^ "'"'a"""^ ■ (18) 

j^l ^data,j 

Here, Cdata is the measured data histogram, as reconstructed by the neural network output, and 
'''data variance in the data. Cmix is a mixture of simulated species as described above. An 

example of a best-fit mixture of template histograms for a slice in energy using only two species is 
shown in Fig. ll(b)| where the "testing" sample from Section [6] (including all species) was treated 



as though it were data, and compared to template histograms of proton and iron only (for easier 
viewing) . 

Simulated nuclei from the testing sample (described in Section [6| were "hand- mixed" in different 
proportions at different energies to test the effectiveness of various minimization strategies, ranging 
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Figure 11; Upper left: the mass output from the neural network is shown as RMS profiles in slices of the reconstructed 
energy with colors corresponding to the different species, as labeled. This plot is analogous to Fig. [Tjafter the neural 
network transformation to the new axes; therefore, the large overlap at the lowest energies corresponds to the similarly 
large overlap in Fig. [7| Lower right: The underlying distributions, or template histograms, for a single slice in energy 
for all 5 particle species. Lower: An example of template histogram fitting: the testing simulation, or "test data", 
sample (black with statistical error bars), which includes all five species, was fit to the proton and iron template 
histograms only - for easier viewing - using the minimization technique described in the text. The grey "mixing 
ratio", or combination of proton and iron comprising the test data, matches the data quite well even though only 
two template histograms were used to fit a sample containing five species. 
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Energy Range Reconstructed 



(logio(Ercco /GeV)) 


{\nA)± stat. ± sys. 


6.3-6.5 


2. 17 ±0.03 


6.5-6.7 


2.28 ±0.06 tg j^ 


6.7-6.9 


2.50 ±0.09 ig j!^ 


6.9-7.1 


2.86 ±0.14 +°f^ 


7.1-7.3 


3.36 ±0.16 tS^Ja 


7.3-7.5 


3.48 ± 0.23 to j^l 



Table 5: (In A) for six analyzed energy ranges. The subscripts (superscripts) are the minimum (maximum) alternate 
(In A) values, found by passing data (with K^o and 5125 adjusted according to alternate models) through the neural 
network and the minimization procedure. 



from using only the two most extreme template histograms (protons and iron) with only one free 
parameter, to using all five template histograms with four free parameters. With too few nuclei, 
the minimizer is confident about the best mixture, but the mixture is not a very good fit. With 
too many, the minimizer can find many options for achieving a good fit (many of them unphysical 
mixtures, such as iron and oxygen present with no silicon in between), so it is less confident about 
the best one. Using three template histograms: protons, "intermediate nuclei" (a 50-50 mixture of 
helium and oxygen), and iron, is a balanced approach: 

= fpCp + (1 - fp) [{ffe)Cfe + (1 - //e)(0.5 • Co + 0.5 • Che)] • (19) 

A list of the number of events for each species as a function of energy is given in Table [4j The mass 
output distributions with respect to energy are shown for the final simulated "analysis" sample in 



Fig. |12(a) The results of a check of this method using a mixed composition sub-set of the "testing" 
sample of simulated data are shown in Fig. |12(b)] where the true mass value (gold bars) is compared 
with the reconstructed mass value (red stars). This minimization technique reconstructs the true 



mean logarithmic mass within the error bars for each slice in energy. Furthermore, Fig. 12(c) and 
Fig. |12(d)| show this technique applied to pure proton and iron subsets of the "testing" sample 
of simulated data, respectively, to demonstrate the capability of the method to reconstruct both 
extremes. As discussed above, as the probability of finding an iron is never 1, the iron reconstruction 
never quite reaches the expected value but remains within statistical error bars. 

The composition results of this combined neural network and minimization technique applied to 
the August 2008 data are shown in Fig.[T3j with data points and statistical error bars in red. The 
systematic errors (shown as alternate symbols) are discussed below and are summarized in Table [s] 

9. Systematic Uncertainties 

This measurement of cosmic ray composition and energy spectrum depends crucially on the 
two independent observables K'jq and 6*125. Systematic errors, particularly in the scaling of one 
observable (independent of the other) can make the composition seem heavier or lighter. To estimate 
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Figure 12: Performance of mass reconstruction, (a): profiles of mass output vs reconstructed energy for the chosen 
simulated species used in the minimization: proton, helium/oxygen mixture, and iron. The parameter space is well- 
covered by these distributions, as demonstrated by the overlap, (b): The result of a check of the 3- type (p/He-O/Fe) 
minimization scheme for all energies using a sub-set of the "testing" sample of simulated data. The true {In A) for 
each slice in energy is marked in gold, while the (In A) chosen by the minimizer as the best match for the hand-mixed 
simulation is shown in red, with statistical error bars. Even with this small testing sample the minimizer finds a very 
good fit to the input values at energies where the detector is fully efficient (i.e. above 6.2 in logio(Ereco)). Similarly 
(c) and (d) show the ability of this technique to reconstruct pure proton and pure iron samples respectively at the 
proper mass. 
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Figure 13: The mean logarithmic mass vs primary energy for August 2008 data as calculated using the analysis 
technique described in this work. The error bars on the data points are statistical, while the additional symbols 
indicate results after systematic shifts estimated using alternate models. 



the magnitude of potential systematic errors and their effect on the measurement of composition, 
the K-jQ-Si2i, parameter space (as depicted in Fig. |7]) was studied using samples of simulated events 
with alternate models, listed below. 

• Hadronic Interaction Model: As the forward-direction physics of the first interaction of high- 
energy cosmic rays has not yet been measured by accelerator experiments on Earth, cosmic 
ray physicists depend upon extrapolations from the known physics at lower energies. There 
are, therefore, a number of different high-energy hadronic interaction models. Model used: 
SIBYLL-2.1 Hg. Alternate models: EPOS-1.99 [31_ and QGSJET-II-03 

• Optical Properties of the Ice: For the 40-string configuration of IceCube there were two models 
available for measuring the depth and wavelength-dependence of the scattering and absorption 
in the ice. In general, the absorption lengths vary from 100 m to 200 m and effective scattering 
lengths range from 20 m to 70 m due to the concentration of dust in the glacial ice, which has 
a strong depth-dependence |19 [ [33 | [34]. Model used: AHA [19], the standard ice model for 
the 40-string configuration of IceCube and was developed using data from the predecessor of 
IceCube (AMANDA). Alternate model: SPICE-1, which measured the ice properties over the 
full depth range of IceCube using the in-situ LEDs present on every DOM main board |33| . 

• DOM Efficiency: The efficiency of the DOMs in the ice depends upon a number of factors 
including the PMT quantum efficiency, the transmission of the glass and the gel, and the 
refrozen ice from the deployment hole (which is less clear than the bulk ice). Measurements 
of the DOM efficiency in a controlled setting have revealed an estimated uncertainty of ap- 
proximately 8% j33, which is equivalent to a shift in logio(iC7o) of 0.035 in logio(iC7o). This 
was treated as a constant systematic uncertainty across all values of S'125 which could either 
raise or lower the measured Kjq in IceCube. IceTop is not affected because its signals are 
calibrated to Vertical Equivalent Muons, as described in Section [2] 
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• Snow correction in reconstruction: As discussed in Section [4. 4[ snow accumulation at the sur- 
face can affect signals in IceTop, and reconstruction methods can correct for this effect in data. 
However, the correction applied during reconstruction will not work perfectly for all events 
at all energies. Potential systematic errors in 5125 due to imprecision in the technique itself 
were studied by comparing simulations in which snow was "removed" during reconstruction 
to simulations in which there was no snow at all. 

• Atmosphere: effect on high-energy muons: Atmospheric conditions high in the atmosphere, 
and the effective temperature of the atmosphere, affect the number of high-energy muons and 
K-jQ. With the selection of one month of data for this analysis, the effective temperature was 
stable; K70 exhibits negligible daily or weekly variation within that month. In addition, the 
atmospheric profile high in the atmosphere was compared to that simulated by CORSIKA, 
and the systematic error due to this mismatch is also negligible compared to other systematic 
effects on Kjq discussed in this section. 

• Atmosphere: effect on electromagnetic surface component: Atmospheric conditions near the 
surface (such as pressure or density) affect the electromagnetic component measured by S'125. 
The actual average atmospheric conditions during August 2008 differs from that simulated in 
CORSIKA. The systematic effect on S'125 due to this mismatch was estimated by comparing 
CORSIKA showers (10 PeV and 10 degree zenith angle) made using the standard South Pole 
winter atmosphere profile to those made using a customized profile built from measurements 
taken during August 2008. Distributions of logiQ(iS'125) differ by less than 0.01, and this 
potential shift was treated as constant across all energies. Understanding these effects in 
IceTop and IceCube is the subject of ongoing study. 

Some of these effects (such as ice model and DOM efficiency) affect only the in-ice measurement 
of -ftTro, while others (such as snow removal and atmospheric pressure) affect only the surface 
measurement of S'125. The hadronic interaction model potentially affects both. A summary of how 
if 70 and S125 are observed to shift when alternate models are explored is shown in Fig. [14] on the 
left and right, respectively. Potential shifts in logio(-ftr7o) are shown as a function of logio(Si25), 
while potential shifts in logio(Si25) are shown as a function of logio(/'ir7o). 

To explore how the final results (composition and spectrum) would change under an alternate 
model, values of either logio(if7o) or logio (S'125) in the data were artificially shifted according to 
the estimated curves in Fig. [l4] (For alternate hadronic interaction models, events were shifted 
half in K-j^ and half in S125.) For each alternate model, these shifted data sets were then run 
through the full analysis machinery (neural network reconstruction of A and Eq and minimization 
with template histograms) to yield "alternate" composition and energy spectrum results. Although 
primitive, this procedure allowed for exploration of many alternate models quickly, without having 
to generate large sets of simulated data for each one. 



Figure 13 shows the composition result (with standard simulations) together with these alternate 
results, after applying a shift either to A'70 or to S'125 (or both). These results, including errors, are 
also summarized numerically in Table [5] The "systematic errors" are drawn from the maximum 



and minimum (In^)yielded by any of the alternate models at each energy. Figure 10(c) shows a 
similar collection of alternate results for the energy spectrum. The numeric values are given in 
Tabled 
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Figure 14: Left: Estimates of systematic uncertainties affecting the in-ice measurement (as shifts in logio-R'70 for 
a given logio>Si25)- Right: Estimates of systematic uncertainties affecting the surface measurement (as shifts in 
logloSi25 for a given logioXTo)- 



10. Results and Discussion 

Data from the combined IceTop and IceCube detectors in their 2008 40-string/40-station con- 
figuration have been used to develop a technique to measure the cosmic ray energy spectrum and 
composition at energies between 1 PeV and 30 PeV. Using measurements of the electromagnetic 
component of the air shower at the surface (5*125) and the muonic component of the air shower 
in the ice {K^o), a neural network in conjunction with a minimization algorithm have been 
employed to extract the relevant physical parameters: (In A) and Eq. 

This technique provided an energy resolution better than 14% with small offset due to primary 
mass, which led to an energy spectrum that compares favorably with other recent measurements 
shown in Fig. |15[ In these figures, the shaded (pink) region indicate the systematic errors for this 



analysis. Comparisons are made to other experiments (with statistical error bars only) in Fig. 15(a) 
Figure [15(b)] highhghts a comparison (including systematic errors) to two spectra measured by the 
26-station configuration of IceTop only (IT-26) |121|, one assuming protons and one assuming a 
two-component model. This work (using a coincidence detector and a neural network based on 
both surface and deep observables) and the IT-26 work (using the IceTop surface detector only and 
an unfolding technique) yield slightly different spectra, but are consistent within systematic errors. 



When fit to the flux model in Eq. (151, the spectrum presented here indicates a power law of indices 
2.61 ± 0.07 below and 3.23 ± 0.09 above a slowly turning knee around 4.75 ± 0.59 PeV, where the 
errors given are statistical. 

This method also provided a reconstructed mass parameter and, in turn, a measurement of 
cosmic ray composition: the mean logarithmic mass, {In A). The mass composition is compared 
with other experimental measurements in Fig. [16] where, additionally, systematic error bars are 
shown for IceCube/IceTop. In this analysis, using the 40-string/40-station configuration of IceTop 
and IceCube from 2008, a mass is observed which similar to some other measured results (especially 
those using a similar electons vs. muons stretegy as in this work), and in disagreement with others 
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Figure 15: The all-particle cosmic ray spectrum found using data from 40-station IceTop/IceCube as described in 
the text. For this analysis the results are shown as (red) stars, with statistical error bars as solid (red) lines and 
systematic errors as the shaded (pink) region. Upper: this result compared to other experiments with their statistical 
errors only. Lower, this result compared to two spectra from IceTop-26, with systematic errors also shown. Fluxes 
are multiplied by _E^'^, and the scales of the two plots are the same. The overall flux and shape found in this work 
are very similar to those reported by other experiments. Data points from |21| . 



29 



< 



5r 
4 
3 
2 
1 




optical measurements 



TUNKA 
YAKUTSK 
HIRES/MIA 
CASA-BLANCA 
DICE (venus) 

HEGRA/AIROBICC (tgsjet) 



direct measurements 
• JACEE 
■ RUNJOB 




e/m measurements 



TIBET 
KASCADE 
GRAPES-3 
CASA-MIA (qgsjet) 
EASTOP (qgsjet) 
HEGRA/CRT (qgsjet) 



10" 



^ spaSe/amanda-bio 

* lceGube/lceTop-40, this work 
K\\1 lceCube/lceTop-40, sys. errors 

10^ 10' 
E/GeV 



,8 



Figure 16: The mean logarithmic mass vs primary energy for a number of experiments, as labeled. The optical and 
e/m measurements used the SIBYLL hadronic interaction model unless noted in parentheses. The IceTop/IceCube 
40-string/40-station results are shown in (red) stars, with solid (red) error bars indicating the statistical errors, while 
the shaded red region represents the systematic errors. The data indicate an increasing mass through the knee. 
The results of this analysis are similar to measu red results from some other experiments (in particular, most e/m 
experiments) as well as the flux model from Eq. ( |15[ l, but dissimilar to others (in particular, optical measurements). 
Data points compiled from | 28l I36| . 
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(especially the optical measurements, which emply a very different strategy). The slope of the strong 
increase in mass through the knee region is nearly identical to the model in Eq. (15 1. Additionally, 
this new analysis technique with IceCube-40/IceTop-40 shows results consistent within systematic 
errors to those from a different technique applied to data from its predecessor, SPASE-2/ AMANDA- 
BIO IMI- 

In the future this technique can be expanded to include new composition-sensitive input param- 
eters, as well as employ the larger IceCube detector and a greater quantity of data. From looking 
at several simulated alternate models, it is clear that systematic uncertainties (both in the in-ice 
measurement and surface measurement) can greatly affect the measured composition and spectrum, 
regardless of the detector's size or livetime. Although the mean logarithmic mass itself is difficult 
to measure absolutely because of systematics, this work shows an unmistakable trend of increasing 
mass, regardless of the differing absolute scale of (In A) measured with respect to different models. 
The systematic studies performed for this analysis have led to many improvements which will allow 
for more precise measurements with new data. Therefore, in the coming years the complete IceCube 
Neutrino Observatory will be the only detector of its kind able to provide both composition and 
energy spectrum measurements from energies overlapping with direct measurements below the knee 
to energies nearing the ankle. 
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